76 research outputs found

Pathway-Based Genomics Prediction using Generalized Elastic Net.

Author: Baertsch Robert
Carlin Daniel E
Paull Evan O
Sokolov Artem
Stuart Joshua M
Publication venue: eScholarship, University of California
Publication date: 01/03/2016

We present a novel regularization scheme called the Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway-agnostic approach.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

FigShare

Retrocopy contributions to the evolution of the human genome

Author: Baertsch Robert
Brosius Jürgen
Diekhans Mark
Haussler David
Kent W James
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Evolution via point mutations is a relatively slow process and is unlikely to completely explain the differences between primates and other mammals. By contrast, 45% of the human genome is composed of retroposed elements, many of which were inserted in the primate lineage. A subset of retroposed mRNAs (retrocopies) shows strong evidence of expression in primates, often yielding functional retrogenes. Results: To identify and analyze the relatively recently evolved retrogenes, we carried out BLASTZ alignments of all human mRNAs against the human genome and scored a set of features indicative of retroposition. Of over 12,000 putative retrocopy-derived genes that arose mainly in the primate lineage, 726 with strong evidence of transcript expression were examined in detail. These mRNA retroposition events fall into three categories: I) 34 retrocopies and antisense retrocopies that added potential protein coding space and UTRs to existing genes; II) 682 complete retrocopy duplications inserted into new loci; and III) an unexpected set of 13 retrocopies that contributed out-of-frame, or antisense sequences in combination with other types of transposed elements (SINEs, LINEs, LTRs), even unannotated sequence to form potentially novel genes with no homologs outside primates. In addition to their presence in human, several of the gene candidates also had potentially viable ORFs in chimpanzee, orangutan, and rhesus macaque, underscoring their potential of function. Conclusion: mRNA-derived retrocopies provide raw material for the evolution of genes in a wide variety of ways, duplicating and amending the protein coding region of existing genes as well as generating the potential for new protein coding space, or non-protein coding RNAs, by unexpected contributions out of frame, in reverse orientation, or from previously non-protein coding sequence.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The UCSC Archaeal Genome Browser

Author: Baertsch Robert
Lowe Todd M.
Pohl Andy
Pollard Katherine S.
Schneider Kevin L.
Publication venue: Oxford University Press
Publication date: 28/12/2005

As more archaeal genomes are sequenced, effective research and analysis tools are needed to integrate the diverse information available for any given locus. The feature-rich UCSC Genome Browser, created originally to annotate the human genome, can be applied to any sequenced organism. We have created a UCSC Archaeal Genome Browser, available at , currently with 26 archaeal genomes. It displays G/C content, gene and operon annotation from multiple sources, sequence motifs (promoters and Shine-Dalgarno), microarray data, multi-genome alignments and protein conservation across phylogenetic and habitat categories. We encourage submission of new experimental and bioinformatic analysis from contributors. The purpose of this tool is to aid biological discovery and facilitate greater collaboration within the archaeal research community.

CiteSeerX

Crossref

PubMed Central

Sustainability? Population Affluence Species Technology

Author: Baertsch Robert
Buckwalter Patrick
Embaye Tsege
Gormly Sherwin
Liggett Travis
Reinsch Sigrid
Trent Jonathan
Publication date: 04/01/2010

Presentation on algae and sustainability of the earth. Discusses the Offshore Membrane Enclosures for Growing Algae (OMEGA).

NASA Technical Reports Server

GeneHub-GEPIS: digital expression profiling for normal and cancer tissues based on an integrated gene database

Author: Adams
Brentani
Chini
D'Agostino
Engstrom
Ferguson
Griffiths-Jones
Hishiki
Houde
Kent
Kim
Lawrence S. Hon
Robert Baertsch
Scheurle
Shiuh-Ming Luoh
Smalheiser
Thomson
Uenishi
William I. Wood
Wu
Wu
Yan Zhang
Zemin Zhang
Zhang
Zhang
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2007

GeneHub-GEPIS is a web application that performs digital expression analysis in human and mouse tissues based on an integrated gene database. Using aggregated expressed sequence tag (EST) library information and EST counts, the application calculates the normalized gene expression levels across a large panel of normal and tumor tissues, thus providing rapid expression profiling for a given gene. The backend GeneHub component of the application contains pre-defined gene structures derived from mRNA transcript sequences from major databases and includes extensive cross references for commonly used gene identifiers. ESTs are then linked to genes based on their precise genomic locations as determined by GMAP. This genome-based approach reduces incorrect matches between ESTs and genes, thus minimizing the noise seen with previous tools. In addition, the gene-centric design makes it possible to add several important features, including text searching capabilities, the ability to accept diverse input values, expression analysis for microRNAs, basic gene annotation, batch analysis and linking between mouse and human genes. GeneHub-GEPIS is available at http://www.cgl.ucsf.edu/Research/genentech/genehub-gepis/ or http://www.gepis.org/

CiteSeerX

Crossref

PubMed Central

Algae Bioreactor Using Submerged Enclosures with Semi-Permeable Membranes

Author: Baertsch Robert
Buckwalter Patrick W
Delzeit Lance D
Embaye Tsegereda N
Flynn Michael T
Gormly Sherwin J
Liggett Travis A
Trent Jonathan D

Methods for producing hydrocarbons, including oil, by processing algae and/or other micro-organisms in an aquatic environment. Flexible bags (e.g., plastic) with CO₂/O₂ exchange membranes, suspended at a controllable depth in a first liquid (e.g., seawater), receive a second liquid (e.g., liquid effluent from a "dead zone") containing seeds for algae growth. The algae are cultivated and harvested in the bags, after most of the second liquid is removed by forward osmosis through liquid exchange membranes. The algae are removed and processed, and the bags are cleaned and reused.

NASA Technical Reports Server

Forces Shaping the Fastest Evolving Regions in the Human Genome

Author: Adam Siepel
Andrew D Kern
Bryan King
Chimpanzee Sequencing and Analysis Consortium
David Haussler
ENCODE Project Consortium
Gene Ontology Consortium
Gill Bejerano
International HapMap Consortium
Jakob S Pedersen
Jim Kent
Kate R Rosenbloom
Katherine S Pollard
Molly Przeworski
Rat Genome Sequencing Project
Robert Baertsch
Sofie R Salama
Sol Katzman
Tim Dreszer
Publication venue: Public Library of Science
Publication date: 01/01/2005

Comparative genomics allows us to search the human genome for segments that were extensively changed in the last ~5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human. These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements are dramatically changed in human but not in other primates, with seven times more substitutions in human than in chimp. The accelerated elements, and in particular the top five, show a strong bias for adenine and thymine to guanine and cytosine nucleotide changes and are disproportionately located in high recombination and high guanine and cytosine content environments near telomeres, suggesting either biased gene conversion or isochore selection. In addition, there is some evidence of directional selection in the regions containing the two most accelerated regions. A combination of evolutionary forces has contributed to accelerated evolution of the fastest evolving elements in the human genome.

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central

Copenhagen University Research Information System

eScholarship - University of California

The Structure of a Rigorously Conserved RNA Element within the SARS Virus Genome

Author: Ares
Batey
Batey
Battiste
Bonnal
Brunger
Bushell
Campanacci
Carter
Cate
Cheong
Correll
Cukras
David Haussler
Deng
Doudna
Dsouza
Egloff
Gautheret
Gendron
Haller Igel
Heus
Huppler
Jonassen
Jones
Jovine
Kean
Manuel Ares
Marchand
Marra
Marv Wickens
Merryman
Michael P Robertson
Murshudov
Nissen
Nyborg
Pan
Pan
Ramos
Robert Baertsch
Rota
Schneider
Sette
Sutton
William G Scott
Wimberly
Winn
Zarembinski
Publication venue: Public Library of Science
Publication date: 28/12/2004

We have solved the three-dimensional crystal structure of the stem-loop II motif (s2m) RNA element of the SARS virus genome to 2.7-Å resolution. SARS and related coronaviruses and astroviruses all possess a motif at the 3′ end of their RNA genomes, called the s2m, whose pathogenic importance is inferred from its rigorous sequence conservation in an otherwise rapidly mutable RNA genome. We find that this extreme conservation is clearly explained by the requirement to form a highly structured RNA whose unique tertiary structure includes a sharp 90° kink of the helix axis and several novel longer-range tertiary interactions. The tertiary base interactions create a tunnel that runs perpendicular to the main helical axis whose interior is negatively charged and binds two magnesium ions. These unusual features likely form interaction surfaces with conserved host cell components or other reactive sites required for virus function. Based on its conservation in viral pathogen genomes and its absence in the human genome, we suggest that these unusual structural features in the s2m RNA element are attractive targets for the design of anti-viral therapeutic agents. Structural genomics has sought to deduce protein function based on three-dimensional homology. Here we have extended this approach to RNA by proposing potential functions for a rigorously conserved set of RNA tertiary structural interactions that occur within the SARS RNA genome itself. Based on tertiary structural comparisons, we propose the s2m RNA binds one or more proteins possessing an oligomer-binding-like fold, and we suggest a possible mechanism for SARS viral RNA hijacking of host protein synthesis, both based upon observed s2m RNA macromolecular mimicry of a relevant ribosomal RNA fold.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

The Consensus Coding Sequence (CCDS) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates the manual review of protein annotations that are inconsistent between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.

Funding: National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01); Wellcome Trust (London, England) (Grant number WT062023); Wellcome Trust (London, England) (Grant number WT077198)

DSpace@MIT

PubMed Central

King's Research Portal